[PR]: Update documentation on temporal averaging, usage of bounds, and generation of weights #601

Merged
merged 6 commits into from
Feb 21, 2024

tomvothecoder
Collaborator

@tomvothecoder tomvothecoder commented Feb 6, 2024

Description

  • Closes [Doc]: Add more info about how temporal averaging is performed and how weights are generated #600
  • Weights are determined from the difference in the bounds -- already done in docstrings
    • xcdat/xcdat/temporal.py

      Lines 170 to 189 in fbf1db6

      Time bounds are used for inferring the time series frequency and for
      generating weights (refer to the ``weighted`` parameter documentation
      below).

      Parameters
      ----------
      data_var: str
          The key of the data variable for calculating averages
      weighted : bool, optional
          Calculate averages using weights, by default True.

          Weights are calculated by first determining the length of time for
          each coordinate point using the difference of its upper and lower
          bounds. The time lengths are grouped, then each time length is
          divided by the total sum of the time lengths to get the weight of
          each coordinate point.

          The weight of masked (missing) data is excluded when averages are
          taken. This is the same as giving them a weight of 0.
  • Data is grouped into the labelled time point for averaging operations
  • If one time point spans across the time intervals that you are averaging into, then weights are not properly assigned
  • get_time_bounds generally assumes data with frequencies of annual, monthly, daily, or sub-daily (I thought of this while writing this issue)
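
The weighting scheme described in the docstring excerpt above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not xcdat's implementation: the helper names `bounds_weights` and `weighted_mean` are hypothetical, missing data is represented as `None`, and bounds are treated as date intervals whose length is the upper bound minus the lower bound.

```python
from collections import defaultdict
from datetime import date


def bounds_weights(bounds, labels):
    """Weight each time point by its bounds length, normalized per group.

    Hypothetical helper for illustration only (not part of xcdat).
    """
    # Length of time covered by each point: upper bound minus lower bound.
    lengths = [(upper - lower).days for lower, upper in bounds]

    # Sum the lengths within each group label (e.g., a year).
    totals = defaultdict(int)
    for label, length in zip(labels, lengths):
        totals[label] += length

    # Divide each length by its group total to get the weight.
    return [length / totals[label] for label, length in zip(labels, lengths)]


def weighted_mean(values, weights):
    """Weighted mean that skips missing values (None), i.e. gives them weight 0."""
    pairs = [(v, w) for v, w in zip(values, weights) if v is not None]
    total_weight = sum(w for _, w in pairs)
    return sum(v * w for v, w in pairs) / total_weight


# Monthly bounds for January to March 2021, all labelled with the year 2021.
bounds = [
    (date(2021, 1, 1), date(2021, 2, 1)),
    (date(2021, 2, 1), date(2021, 3, 1)),
    (date(2021, 3, 1), date(2021, 4, 1)),
]
labels = [2021, 2021, 2021]

weights = bounds_weights(bounds, labels)  # lengths are 31, 28, and 31 days
mean_all = weighted_mean([1.0, 2.0, 3.0], weights)
mean_masked = weighted_mean([1.0, None, 3.0], weights)  # February is masked
```

Dropping the masked value and renormalizing over the remaining weights gives the same result as keeping it with a weight of 0, which is the behavior the docstring describes.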

Checklist

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • My changes generate no new warnings
  • Any dependent changes have been merged and published in downstream modules

If applicable:

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass with my changes (locally and CI/CD build)
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have noted that this is a breaking change for a major release (fix or feature that would cause existing functionality to not work as expected)

@tomvothecoder
Collaborator Author

Hey @pochedls, this small PR is ready for review whenever you have time.

Collaborator Author

@tomvothecoder tomvothecoder left a comment

Added "Warning:" to some comments.

xcdat/temporal.py: 4 outdated review threads (resolved)

codecov bot commented Feb 7, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (94271f1) 100.00% compared to head (99327ff) 100.00%.

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #601   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           15        15           
  Lines         1602      1602           
=========================================
  Hits          1602      1602           

@tomvothecoder tomvothecoder added type: docs Updates to documentation and removed Type: Documentation labels Feb 12, 2024
@tomvothecoder tomvothecoder changed the title Update documentation on temporal averaging, usage of bounds, and generation of weights [Pull Request] Update documentation on temporal averaging, usage of bounds, and generation of weights Feb 12, 2024
@tomvothecoder tomvothecoder changed the title [Pull Request] Update documentation on temporal averaging, usage of bounds, and generation of weights [PR] Update documentation on temporal averaging, usage of bounds, and generation of weights Feb 12, 2024
@tomvothecoder tomvothecoder changed the title [PR] Update documentation on temporal averaging, usage of bounds, and generation of weights [PR]: Update documentation on temporal averaging, usage of bounds, and generation of weights Feb 12, 2024
Collaborator

@pochedls pochedls left a comment

@tomvothecoder – thanks for taking this on. I've made some suggestions for you to consider.

xcdat/bounds.py: 1 outdated review thread (resolved)
xcdat/bounds.py Outdated
Comment on lines 332 to 335
This method generally assumes data with time frequencies of annual,
monthly, daily, or sub-daily. It loops over the time axis coordinate
variables and attempts to add bounds for each of them if they don't
exist.
Collaborator

This is not always true, since the specified method can be "midpoint." Suggest adding text similar to the text in add_missing_bounds where the freq optional argument is documented in the docstring.
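
For context on the "midpoint" method mentioned here, one generic way to derive bounds from midpoints is sketched below. This is an assumption-laden illustration, not xcdat's code: `midpoint_bounds` is a hypothetical name, and extending the two outer edges by the adjacent half-step is just one possible convention.

```python
from datetime import datetime


def midpoint_bounds(points):
    """Build (lower, upper) bounds from midpoints between adjacent time points.

    Hypothetical helper for illustration only (not part of xcdat).
    """
    if len(points) < 2:
        raise ValueError("At least two time points are needed.")

    # Midpoints between each pair of neighbouring points.
    mids = [a + (b - a) / 2 for a, b in zip(points, points[1:])]

    # Assumed convention: extend the outer edges by the adjacent half-step.
    first_lower = points[0] - (mids[0] - points[0])
    last_upper = points[-1] + (points[-1] - mids[-1])

    edges = [first_lower] + mids + [last_upper]
    return list(zip(edges[:-1], edges[1:]))


# Irregularly spaced points: no annual/monthly/daily frequency is assumed.
points = [datetime(2020, 1, 1), datetime(2020, 1, 2), datetime(2020, 1, 4)]
bounds = midpoint_bounds(points)
```

Because the bounds depend only on the spacing between neighbouring points, this kind of approach does not require the data to have one of the listed frequencies.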

Collaborator

While I'm here...could a space be added between the freq and daily_subfreq arguments in the documentation (for consistency)?

Collaborator Author

@tomvothecoder tomvothecoder Feb 16, 2024

I'll copy the text from add_missing_bounds() here.

Typically numpy style docstrings have no spaces in between parameters. I think I added spaces after parameters that use bullet points in the description because it wasn't rendering lists correctly. I removed spaces and the docs still seem to render fine.

I'll go through docstrings and remove spaces if they are used for parameters.

Collaborator

@tomvothecoder – I don't have strong feelings about this. I just noticed it wasn't consistent (but now I understand why).

Comment on lines 191 to 192
Warning: If one time point spans across the time intervals that
you are averaging into, then weights are not properly assigned.
Collaborator

I'm debating if this should go here or a Note below. Suggest the following text.

Warning: When using weighted averages, the weights are assigned based on the timepoint value. For example, a time point of 2020-06-15 with bounds (2020-06-01, 2020-06-30) has 30 days of weight assigned to June, 2020 (e.g., for an annual average calculation). This would be expected behavior, but it's possible that data could span across typical temporal boundaries. For example, a time point of 2020-06-01 with bounds (2020-05-16, 2020-06-15) would have 30 days of weight, but this weight would be assigned to June, 2020, which would be incorrect (15 days of weight should be assigned to May and 15 days of weight should be assigned to June). This issue could plausibly arise when using pentad data.
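
The difference between label-based and overlap-based weight assignment in the second example can be sketched as follows. This is an illustration only: both helper names are hypothetical, and the exact day counts depend on how the bounds are interpreted. With the plain date arithmetic used here, the overlap split comes out as 16 days in May and 14 in June rather than an even 15/15.

```python
from datetime import date


def days_by_label(lower, upper, time_point):
    """Assign the full bounds length to the month of the labelled time point."""
    return {(time_point.year, time_point.month): (upper - lower).days}


def days_by_overlap(lower, upper):
    """Split the bounds length across the months the interval overlaps."""
    result = {}
    start = lower
    while start < upper:
        # First day of the month following `start`.
        if start.month == 12:
            next_month = date(start.year + 1, 1, 1)
        else:
            next_month = date(start.year, start.month + 1, 1)
        end = min(next_month, upper)
        result[(start.year, start.month)] = (end - start).days
        start = end
    return result


# A time point on 2020-06-01 whose bounds begin in May and end in June.
lower, upper = date(2020, 5, 16), date(2020, 6, 15)
time_point = date(2020, 6, 1)

labelled = days_by_label(lower, upper, time_point)  # all 30 days go to June
overlap = days_by_overlap(lower, upper)             # days split across May and June
```

The label-based result mirrors the behavior described in the warning: the whole interval is credited to June even though roughly half of it falls in May.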

Collaborator

Also make sure new lines are used consistently between docstring parameter entries (here and elsewhere).

Collaborator Author

Removed this change and added your suggestion to the Notes section.

Comment on lines 411 to 413

Warning: If one time point spans across the time intervals that
you are averaging into, then weights are not properly assigned.
Collaborator

Here and elsewhere – should this be a longer entry at the bottom of the docstring (like the example above)? Maybe we could also include a brief note, like:

"Note that weights are assigned by the labelled time point. If the dataset includes timepoints that span across typical boundaries (e.g., a timepoint on 2020-06-01 with bounds that begin in May 2020 and end in June 2020), the weights will not be assigned properly. See explanation below."

Collaborator Author

@tomvothecoder tomvothecoder left a comment

@pochedls I think I addressed your suggestions. Let me know if I'm missing anything.

xcdat/temporal.py: 2 outdated review threads (resolved)
tomvothecoder and others added 6 commits February 21, 2024 09:44
- Add more specific information on how temporal averaging grouping is performed
- Add information on assumed time frequencies for `get_time_bounds()`
- Add info about how weights are not properly assigned if one time point spans across the time interval being averaged into
Co-authored-by: Stephen Po-Chedley <pochedley@gmail.com>
@tomvothecoder tomvothecoder merged commit 78dedf0 into main Feb 21, 2024
9 checks passed
@tomvothecoder tomvothecoder deleted the doc/600-temp-avg-info branch February 21, 2024 17:51
@tomvothecoder tomvothecoder restored the doc/600-temp-avg-info branch February 21, 2024 23:52
Labels
type: docs Updates to documentation
Linked issue
[Doc]: Add more info about how temporal averaging is performed and how weights are generated
2 participants